-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try reconnecting to SHiP again when read fails. #614
Conversation
// Clearly this approach will introduce overhaad, but since this piece of code | ||
// will only execute during recovery, it should be fine. | ||
if (head_header->number > 250) { | ||
start_from -= 500; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is trivial to keep track of LIB which would be better than just guessing at a block number. See block.last_irreversible.block_num
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means we need a set of functions to query single block etc.
And there's a corner case:
FORK ------ A (origin HEAD)
| -LIB--------A' (new block at same height)
In the extreme case that LIB already pass the fork block when we try to restart, LIB is not enough to recover.
stream->binary(true); | ||
stream->read_message_max(0x1ull << 36); | ||
connect_stream(); | ||
initial_read(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would refactor this and the code from init()
into a common method.
// Any other error will still result in exit. | ||
SILK_INFO << "Trying to recover from SHiP read failure."; | ||
// Wait for a while before doing anything in case we hit some network jam. | ||
std::this_thread::sleep_for(std::chrono::seconds(3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 seconds seems like a rather long time.
abi = load_abi(eosio::json_token_stream{(char*)buff.data().data()}); | ||
auto end = buff.prepare(1); | ||
((char *)end.data())[0] = '\0'; | ||
buff.commit(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems rather odd that it would not always contain the terminating character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I am not sure whether we should fix it like this or not.
I don't think boost websocket scream will write a zero there. So maybe we should fix json_token_stream.
But I am not sure if it's a good idea to touch cdt for this issue..
After some discussion, we will have some major logic changes, so close this PR. |
Fix #579 #583
Some behavior may subject to discussion:
1 Only retry if failed to read. that is, failed to send requests or failed to generate block will not trigger retry.
Logic here: majority of the recoverable (by reconnect) fails happen here and the reconnect logic is simple there. Supporting reconnect during other steps would be much more complex and the gain is marginal.
2 When reconnect to SHiP, start from 250s before canonical HEAD (if possible)
Logic here:
a This approach is simple. No moving parts at all.
b It is well tested that the code can start from a certain early block.
c The block 250s before canonical HEAD must have been irreversible already.
(250s maybe too much, we can change that)